Smart Paper Notes

Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge

Alef Iury Siqueira Ferreira, Gustavo dos Reis Oliveira

Categories: Natural Language Processing

2022-07-29

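This paper presents our efforts for the shared task on Automatic Speech Recognition for Spontaneous and Prepared Speech and Speech Emotion Recognition in Portuguese (SE&R 2022). The goal of the challenge is to advance ASR research for Portuguese, covering prepared and spontaneous speech in different dialects. Our method consists of fine-tuning an ASR model in a domain-specific way, applying gain normalization and selective noise insertion. The proposed method improved over the strong baseline provided in 3 of the 4 available tracks.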

translated by Google Translate
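
Gain normalization, one of the preprocessing steps named in the abstract above, can be sketched as follows. This is a minimal illustration assuming peak-based normalization to a target level in dBFS; the abstract does not say which variant or target level the authors used, and `normalize_gain` and `target_dbfs` are hypothetical names chosen for this example.

```python
def normalize_gain(samples, target_dbfs=-3.0):
    """Scale float samples in [-1, 1] so that the peak sits at target_dbfs.

    Assumption: peak normalization. The paper may use a different scheme
    (for example, loudness- or RMS-based normalization).
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    target_peak = 10.0 ** (target_dbfs / 20.0)  # dBFS -> linear amplitude
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Applying the same target level to every recording removes loudness differences between training utterances before fine-tuning.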

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Marília Costa Rosendo Silva, Felipe Alves Siqueira, João Pedro Mantovani Tarrega, João Vitor Pataca Beinotti, Augusto Sousa Nunes, Miguel de Mattos Gardini, Vinícius Adolfo Pereira da Silva, Nádia Félix Felipe da Silva, André Carlos Ponce de Leon Ferreira de Carvalho

Categories: Machine Learning | Natural Language Processing | Machine Learning (Statistics)

2022-08-02

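Extracting knowledge from unlabeled text with machine learning algorithms can be complex. Document categorization and information retrieval are two applications that can benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. Initialization can lead to variability, depending on the machine learning algorithm. In addition, distortions can be misleading with regard to cluster geometry. Among the causes, the presence of outliers and anomalies can be a determining factor. Although initialization and outlier issues are relevant to text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology, since similar procedures go by different names. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, factorization, and clustering algorithms that are directly or indirectly related to the reviewed works.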


Out-Of-Distribution Detection Is Not All You Need

Joris Guérin, Kevin Delmas, Raul Sena Ferreira, Jérémie Guiochet

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-11-29

The usage of deep neural networks in safety-critical systems is limited by our ability to guarantee their correct behavior. Runtime monitors are components aiming to identify unsafe predictions and discard them before they can lead to catastrophic consequences. Several recent works on runtime monitoring have focused on out-of-distribution (OOD) detection, i.e., identifying inputs that are different from the training data. In this work, we argue that OOD detection is not a well-suited framework to design efficient runtime monitors and that it is more relevant to evaluate monitors based on their ability to discard incorrect predictions. We call this setting out-of-model-scope detection and discuss the conceptual differences with OOD. We also conduct extensive experiments on popular datasets from the literature to show that studying monitors in the OOD setting can be misleading: 1. very good OOD results can give a false impression of safety, 2. comparison under the OOD setting does not allow identifying the best monitor to detect errors. Finally, we also show that removing erroneous training data samples helps to train better monitors.
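
The difference between the two evaluation settings can be illustrated with a toy example. The sketch below is not taken from the paper: the five samples, their labels, and the helper `rejection_recall` are all hypothetical. It scores the same monitor once against "is the input OOD?" and once against "is the model's prediction wrong?".

```python
def rejection_recall(rejected, positives):
    """Fraction of positive samples that the monitor rejected."""
    hits = sum(1 for r, p in zip(rejected, positives) if r and p)
    total = sum(1 for p in positives if p)
    return hits / total if total else 0.0

# Hypothetical toy data: one entry per input sample.
is_ood   = [True, True, False, False, False]   # input differs from training data
is_wrong = [True, False, True, True, False]    # model prediction is incorrect
rejected = [True, True, False, False, False]   # monitor decision

ood_recall = rejection_recall(rejected, is_ood)    # judged as an OOD detector
oms_recall = rejection_recall(rejected, is_wrong)  # judged on discarding errors
```

In this toy case the monitor catches every OOD input yet lets two of the three incorrect predictions through, which is the kind of mismatch the abstract warns about.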


Toward Human-AI Co-creation to Accelerate Material Discovery

Dmitry Zubarev, Carlos Raoni Mendes, Emilio Vital Brazil, Renato Cerqueira, Kristin Schmidt, Vinicius Segura, Juliana Jansen Ferreira, Dan Sanders

Categories: Machine Learning | Artificial Intelligence

2022-11-05

There is an increasing need in our society to achieve faster advances in Science to tackle urgent problems, such as climate changes, environmental hazards, sustainable energy systems, pandemics, among others. In certain domains like chemistry, scientific discovery carries the extra burden of assessing risks of the proposed novel solutions before moving to the experimental stage. Despite several recent advances in Machine Learning and AI to address some of these challenges, there is still a gap in technologies to support end-to-end discovery applications, integrating the myriad of available technologies into a coherent, orchestrated, yet flexible discovery process. Such applications need to handle complex knowledge management at scale, enabling knowledge consumption and production in a timely and efficient way for subject matter experts (SMEs). Furthermore, the discovery of novel functional materials strongly relies on the development of exploration strategies in the chemical space. For instance, generative models have gained attention within the scientific community due to their ability to generate enormous volumes of novel molecules across material domains. These models exhibit extreme creativity that often translates into low viability of the generated candidates. In this work, we propose a workbench framework that aims at enabling the human-AI co-creation to reduce the time until the first discovery and the opportunity costs involved. This framework relies on a knowledge base with domain and process knowledge, and user-interaction components to acquire knowledge and advise the SMEs. Currently, the framework supports four main activities: generative modeling, dataset triage, molecule adjudication, and risk assessment.


Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts

Stefano Pini, Christian S. Perone, Aayush Ahuja, Ana Sofia Rufino Ferreira, Moritz Niendorf, Sergey Zagoruyko

Categories: Robotics | Machine Learning

2022-11-03

The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for both the self-driving vehicle and other road agents, using a unified neural network architecture for prediction and planning. During inference, we select the planning trajectory that minimizes a cost taking into account safety and the predicted probabilities. Our approach does not depend on any rule-based planners for trajectory generation or optimization, improves with more training data and is simple to implement. We extensively evaluate our method through a realistic simulator and show that the predicted trajectory distribution corresponds to different driving profiles. We also successfully deploy it on a self-driving vehicle on urban public roads, confirming that it drives safely without compromising comfort. The code for training and testing our model on a public prediction dataset and the video of the road test are available at https://woven.mobi/safepathnet
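
The inference-time selection step described above, picking the candidate plan that minimizes a cost built from safety and predicted probability, might look roughly like this. The cost form (a safety penalty minus a weighted log-probability), the function `select_trajectory`, and the parameter `prob_weight` are assumptions made for illustration; the abstract does not give the actual cost function.

```python
import math

def select_trajectory(candidates, prob_weight=1.0):
    """Return the index of the candidate with the lowest combined cost.

    candidates: list of (probability, safety_cost) pairs, one per
    predicted trajectory mode. Hypothetical cost: safety penalty minus
    a weighted log-probability, so likely and safe plans score lowest.
    """
    def cost(candidate):
        prob, safety_cost = candidate
        return safety_cost - prob_weight * math.log(max(prob, 1e-9))

    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))
```

With `[(0.6, 5.0), (0.3, 0.5), (0.1, 0.4)]` the most probable mode is rejected because of its high safety cost, and the second candidate is selected.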


Optimizing Crop Management with Reinforcement Learning and Imitation Learning

Ran Tao, Pan Zhao, Jing Wu, Nicolas F. Martin, Matthew T. Harrison, Carla Ferreira, Zahra Kalantari, Naira Hovakimyan

Categories: Artificial Intelligence | Machine Learning

2022-09-20

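Crop management, including nitrogen (N) fertilization and irrigation management, has a significant impact on crop yield, economic profit, and the environment. Although management guidelines exist, finding the optimal management practices for a specific planting environment and crop is challenging. Previous work used reinforcement learning (RL) and crop simulators to address the problem, but the trained policies either have limited performance or are not deployable in the real world. In this paper, we present an intelligent crop management system that optimizes N fertilization and irrigation simultaneously via RL, imitation learning (IL), and crop simulations using the Decision Support System for Agrotechnology Transfer (DSSAT). We first use deep RL, in particular a deep Q-network, to train management policies that require all state information from the simulator as observations (denoted as full observation). We then invoke IL to train management policies that need only a limited amount of state information that can be readily obtained in the real world (denoted as partial observation), by mimicking the actions of the previously RL-trained policies under full observation. We conduct experiments on a case study using maize in Florida and compare the trained policies with a maize management guideline. Our trained policies under both full and partial observation achieve better outcomes, yielding a higher profit, or a similar profit with a smaller environmental impact. Moreover, the partial-observation management policies are directly deployable in the real world because they use readily available information.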


AutoPET Challenge: Combining nn-Unet with Swin UNETR Augmented by Maximum Intensity Projection Classifier

Lars Heiliger, Zdravko Marinov, André Ferreira, Jana Fragemann, Jacob Murray, David Kersting, Rainer Stiefelhagen, Jens Kleesiek

Categories: Computer Vision

2022-09-02

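Tumor volume and changes in tumor characteristics over time are important biomarkers for cancer therapy. In this context, FDG-PET/CT scans are routinely used for staging and re-staging of cancer, because the radiolabeled fluorodeoxyglucose is taken up in regions of high metabolism. Unfortunately, these regions of high metabolism are not specific to tumors and can also represent physiological uptake by normally functioning organs, inflammation, or infection, which makes detailed and reliable tumor segmentation in these scans a demanding task. The AutoPET challenge addresses this research gap by providing a public dataset of FDG-PET/CT scans from 900 patients to encourage further improvement in this field. Our contribution to the challenge is an ensemble of two state-of-the-art segmentation models, nn-Unet and Swin UNETR, augmented by a maximum intensity projection classifier that acts like a gating mechanism. If it predicts the presence of lesions, both segmentations are combined by a late fusion approach. Our solution achieves a Dice score of 72.12% in our cross-validation on patients diagnosed with lung cancer, melanoma, and lymphoma. Code: https://github.com/heiligerl/autopet_submission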
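
The Dice score and the gate-then-fuse idea can be sketched on flat binary masks as follows. This is an illustrative sketch, not the authors' code: averaging the two models' probabilities and thresholding at 0.5 is an assumed fusion rule, returning 1.0 for two empty masks is a convention chosen here, and both function names are hypothetical.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flat 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

def gated_late_fusion(probs_a, probs_b, lesion_predicted, threshold=0.5):
    """Fuse two probability maps only if the gate predicts a lesion.

    Assumed fusion rule: average the per-voxel probabilities, then
    threshold. If the gate says "no lesion", return an empty mask.
    """
    if not lesion_predicted:
        return [0] * len(probs_a)
    return [1 if (a + b) / 2.0 >= threshold else 0
            for a, b in zip(probs_a, probs_b)]
```

The gate suppresses false-positive segmentations on scans without lesions, where physiological uptake could otherwise be mistaken for tumor.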



Tree-Based Adaptive Model Learning

Tiago Ferreira, Gerco van Heerdt, Alexandra Silva

Categories: Machine Learning

2022-08-31

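We extend the Kearns-Vazirani learning algorithm to handle systems that change over time. We present a new learning algorithm that can reuse and update previously learned behavior, implement it in the LearnLib library, and evaluate it on large examples, to which we make small adjustments between two runs of the algorithm. In these experiments, our algorithm significantly outperforms both the classic Kearns-Vazirani learning algorithm and the current state-of-the-art adaptive algorithm.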


Unifying Evaluation of Machine Learning Safety Monitors

Joris Guerin, Raul Sena Ferreira, Kevin Delmas, Jérémie Guiochet

Categories: Machine Learning | Artificial Intelligence | Computer Vision | Robotics

2022-08-31

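With the increasing use of machine learning (ML) in critical autonomous systems, runtime monitors have been developed to detect prediction errors and keep the system in a safe state during operation. Monitors have been proposed for different applications involving a variety of perception tasks and ML models, and specific evaluation procedures and metrics are used in different contexts. This paper introduces three unified safety-oriented metrics, representing the safety benefit of the monitor (Safety Gain), the safety gap remaining after using it (Residual Hazard), and its negative impact on system performance (Availability Cost). Computing these metrics requires defining two return functions that represent how a given ML prediction affects expected future rewards and hazards. Three use cases (classification, drone landing, and autonomous driving) are used to demonstrate how metrics from the literature can be expressed in terms of the proposed metrics. Experimental results on these examples show how different evaluation choices affect the perceived performance of a monitor. Because our formalism requires us to state explicit safety assumptions, it allows us to ensure that the evaluation conducted matches the high-level system requirements.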


An Evolutionary Approach for Creating of Diverse Classifier Ensembles

Alvaro R. Ferreira Jr, Fabio A. Faria, Gustavo Carneiro, Vinicius V. de Melo

Categories: Computer Vision

2022-08-23

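Classification is one of the most studied tasks in data mining and machine learning, and many works in the literature have been proposed to solve classification problems in multiple fields of knowledge, such as medicine, biology, security, and remote sensing. Since no single classifier achieves the best results for all kinds of applications, a good alternative is to adopt classifier fusion strategies. A key to the success of classifier fusion approaches is the combination of diversity and accuracy among the classifiers belonging to an ensemble. With the large number of classification models available in the literature, one challenge is choosing the most suitable classifiers to compose the final classification system, which creates the need for classifier selection strategies. We address this point with a framework for classifier selection and fusion based on a four-step protocol called CIF-E (Classifiers, Initialization, Fitness function, and Evolutionary algorithm). We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol and are able to find the most accurate approach. A comparative analysis is also performed among the best approaches and many other baselines from the literature. The experiments show that the proposed evolutionary approach based on the Univariate Marginal Distribution Algorithm (UMDA) can outperform state-of-the-art approaches from the literature on many well-known UCI datasets.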
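
For readers unfamiliar with UMDA, a minimal binary version is sketched below, where each bit could stand for "include classifier i in the ensemble". This is a generic textbook-style sketch, not the CIF-E implementation: the population sizes, the probability clamping, and the `fitness` callable are assumptions, and a real fitness function would score the accuracy of the selected ensemble on validation data.

```python
import random

def umda(fitness, n_bits, pop_size=50, n_select=20, generations=30, seed=0):
    """Binary UMDA: sample from per-bit marginals, keep the best
    individuals, and re-estimate the marginals from them."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial marginal probability of each bit
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        top_fit = fitness(population[0])
        if top_fit > best_fit:
            best, best_fit = population[0], top_fit
        selected = population[:n_select]
        # Re-estimate each bit's marginal from the selected individuals,
        # clamped (an assumed safeguard) so no bit is fixed permanently.
        probs = [min(max(sum(ind[i] for ind in selected) / n_select, 0.05), 0.95)
                 for i in range(n_bits)]
    return best, best_fit
```

A toy call such as `umda(sum, n_bits=8)` maximizes the number of selected bits; replacing `sum` with an ensemble-accuracy function turns the same loop into a classifier selection search.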
